Seeing to hear better: evidence for early audio-visual interactions in speech identification.

Authors

  • Jean-Luc Schwartz
  • Frédéric Berthommier
  • Christophe Savariaux
Abstract

Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition, owing to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture with a non-speech visual input that followed exactly the same time course, and thus provided the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.

Similar articles

Modeling audio-visual speech perception: back on fusion architectures and fusion control

In a review paper about audio-visual (AV) fusion models in speech perception, we (Schwartz et al., 1998) proposed a taxonomy of models around two basic questions: architecture and control. Six years later, the proposals we made still seem convenient for discussing major questions about AV fusion. Moreover, and more importantly, recent experimental and theoretical progre...
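
The taxonomy's architecture question is often summarised as a choice between fusing the modalities at the feature level or at the decision level, with a control term adjusting how much each modality is trusted. As a rough, hedged illustration only, the sketch below contrasts the two options on toy data; the feature dimensions, the logistic-regression classifiers and the fixed fusion weight are illustrative assumptions, not the models reviewed in the paper.

```python
# Minimal sketch (assumptions, not the reviewed models): early (feature-level)
# fusion concatenates audio and visual features before classification, while
# late (decision-level) fusion combines per-modality posteriors afterwards.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)                          # two toy phoneme classes
audio = labels[:, None] + rng.normal(0, 1.0, (n, 8))    # noisy audio features
visual = labels[:, None] + rng.normal(0, 1.5, (n, 4))   # noisier lip features

# Early fusion: a single classifier over the concatenated feature vector.
early = LogisticRegression().fit(np.hstack([audio, visual]), labels)

# Late fusion: independent classifiers, posteriors averaged at decision time.
clf_a = LogisticRegression().fit(audio, labels)
clf_v = LogisticRegression().fit(visual, labels)

def late_fusion_proba(a, v, w_audio=0.5):
    """Weighted average of per-modality posteriors; w_audio stands in for a
    simple 'fusion control' term (e.g. lowered when the audio is noisy)."""
    return w_audio * clf_a.predict_proba(a) + (1 - w_audio) * clf_v.predict_proba(v)

print("early-fusion accuracy:", early.score(np.hstack([audio, visual]), labels))
print("late-fusion accuracy:",
      (late_fusion_proba(audio, visual).argmax(1) == labels).mean())
```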

When and Why Feedback Matters in the Perceptual Learning of Visual Properties of Speech

This study investigated the effects of feedback on the perception of words in point-light displays of speech. Participants attempted to identify individual words that they could see (but not hear) being spoken. After attempting to identify the word in each visual-only stimulus, the participants were informed of the identity of the word by receiving feedback in one of three forms: seeing the vis...

Combining techniques to reveal emergent effects in infants' segmentation, word learning, and grammar.

This paper provides three representative examples that highlight the ways in which procedures can be combined to study interactions across traditional domains of study: segmentation, word learning, and grammar. The first section uses visual familiarization prior to the Headturn Preference Procedure to demonstrate that synchronized visual information aids in speech segmentation in noise. The sec...

Developing an audio-visual speech source separation algorithm

Looking at the speaker's face helps a listener hear a speech signal better and extract it from competing sources before identification. This might lead to new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm plugging audio-visual coherence, estimated by statistical tools, into classical blind source separa...

Further experiments on audio-visual speech source separation

Looking at the speaker's face seems to help a listener hear a speech signal better and extract it from competing sources before identification. This might lead to new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, we present a set of experiments on a novel algorithm plugging audio-visual coherence estimated by statistical...
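
One hedged way to read "plugging audio-visual coherence into classical blind source separation" is as a scoring stage applied to candidate sources that a separation algorithm has already produced: keep the candidate whose amplitude envelope best follows the visible lip gesture. The sketch below illustrates that reading only; the envelope estimator, frame size, `lip_aperture` signal and correlation score are assumptions for illustration, not the authors' algorithm.

```python
# Minimal sketch (assumptions, not the published method): rank already-separated
# candidate audio sources by their coherence with a lip-aperture trajectory,
# approximated here as the correlation between envelope and lip movement.
import numpy as np

def amplitude_envelope(x, frame=160):
    """Crude envelope: RMS over non-overlapping frames (e.g. 10 ms at 16 kHz)."""
    n = len(x) // frame * frame
    return np.sqrt((x[:n].reshape(-1, frame) ** 2).mean(axis=1))

def av_coherence(candidate, lip_aperture, frame=160):
    """Correlation between a candidate's envelope and the lip trajectory,
    after resampling the lip signal to the envelope's frame rate."""
    env = amplitude_envelope(candidate, frame)
    lip = np.interp(np.linspace(0, 1, len(env)),
                    np.linspace(0, 1, len(lip_aperture)), lip_aperture)
    return np.corrcoef(env, lip)[0, 1]

def pick_speech_source(candidates, lip_aperture):
    """Return the index of the separated source most coherent with the lips."""
    scores = [av_coherence(c, lip_aperture) for c in candidates]
    return int(np.argmax(scores)), scores
```

In practice the coherence statistics would be learned from audio-visual speech data rather than reduced to a single correlation, but the selection step keeps the same shape: score each separated candidate against the video and keep the most coherent one.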

Journal:
  • Cognition

Volume 93, Issue 2

Pages: -

Publication date: 2004